
feat(perception): monocular obstacle avoidance via optical flow 1775 #1908

Draft
jhengyilin wants to merge 5 commits into dimensionalOS:dev from
jhengyilin:feature/optical-flow-obstacle-avoidance-1775

Conversation

@jhengyilin

@jhengyilin jhengyilin commented Apr 23, 2026

feat(perception): monocular obstacle avoidance via optical flow τ-estimation

Branch: feature/optical-flow-obstacle-avoidance-1775 → main


Summary

Implements OpticalFlowModule — real-time Time-to-Contact (τ) estimation from
sparse optical flow on a single RGB camera. No depth sensor, no calibration, no
GPU. Designed as a complement to LiDAR proximity sensing, not a replacement.
Closes dimensionalOS#1775.

Answers to the issue's questions

"Can optical flow be used for real-time obstacle avoidance?"
Yes, as a complement to LiDAR — not a replacement. It detects fast-approaching
obstacles via τ = 1/divergence with 93.9% precision when it fires, but recall
is 0.295, so it only catches ~30% of danger frames. It is blind to stationary
obstacles. Recommended: pair with LiDAR for static proximity; use optical flow
for dynamic and fast-approaching obstacles.

"How much does optical flow correlate with LiDAR?"
Weakly. Spearman r = +0.173 — correct direction but too noisy for a calibrated
timer. Use as a ranked urgency signal, not a precise TTC.

"What keypoint strategies? What does ORB-SLAM use?"
ORB-SLAM uses Oriented FAST for rotation-invariant BRIEF descriptor matching.
That orientation step is irrelevant for Lucas-Kanade, which tracks image patches
directly and has no orientation input. Plain FAST is the correct choice here.

Problem

The Go2 uses LiDAR for obstacle proximity but cannot answer "how fast is it
approaching?" This adds a camera-based early-warning signal for fast-approaching
or dynamic obstacles, complementing LiDAR rather than replacing it.

What Was Built

dimos/perception/optical_flow/
├── __init__.py
├── tac_estimator.py           ← divergence → τ math 
├── optical_flow_module.py     ← dimos Module: camera in, signals out
└── backends/lucas_kanade.py   ← FAST corners + Lucas-Kanade tracking

scripts/eval_optical_flow_gated.py   ← odom-gated evaluation vs LiDAR
scripts/visualize_optical_flow.py    ← live Rerun viewer (annotated frames + τ plot)
RESULTS.md                           ← theory, methodology, full results

Pipeline at a glance:

flowchart LR
    A["Camera RGB frame"] --> B["FAST keypoint detection"]
    B --> C["Lucas-Kanade tracking"]
    C --> D["Flow vectors (dx, dy)"]
    D --> E["5×5 grid divergence"]
    E --> F["τ = 1 / divergence"]
    F --> G{"τ < threshold?"}
    G -- yes --> H["DANGER signal"]
    G -- no --> I["CLEAR"]
    J["IMU ω (yaw rate)"] --> K{"|ω| > 0.3?"}
    K -- yes --> L["GATED — suppress danger"]
    K -- no --> G
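The grid τ computation in the flowchart can be sketched in a few lines. This is an illustrative numpy toy, not the module's code: a synthetic radially expanding flow field where the analytic answer is known.

```python
import numpy as np

# Toy sketch (not the module's code): tau = 1 / divergence on a synthetic
# radially expanding flow field, binned into the 5x5 grid from the flowchart.
H, W = 100, 100
y, x = np.mgrid[0:H, 0:W].astype(float)
k = 0.1                        # expansion rate per frame
u = k * (x - W / 2)            # horizontal flow component
v = k * (y - H / 2)            # vertical flow component

div = np.gradient(u, axis=1) + np.gradient(v, axis=0)  # = 2k everywhere here

# mean divergence per 5x5 grid cell, then tau = 1 / divergence
cells = div.reshape(5, H // 5, 5, W // 5).mean(axis=(1, 3))
tau = np.where(cells > 1e-6, 1.0 / cells, np.inf)
# tau.min() is ~5.0 frames, i.e. 1 / (2k)
```

For pure expansion the divergence is constant, so every cell reports the same τ; on real footage the approaching obstacle's cells report a smaller τ than the background.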

Module Interface

| Stream | Direction | Type | Description |
| --- | --- | --- | --- |
| color_image | In | Image | RGB/BGR camera feed |
| angular_velocity | In | Any | Yaw rate (rad/s), optional |
| danger_signal | Out | Bool | True when min τ < threshold AND not turning |
| tac_grid | Out | Any | 5×5 float array of per-cell τ values |
| flow_visualization | Out | Image | Debug overlay: flow vectors + grid |

When angular_velocity is connected and |ω| > omega_max (default 0.3 rad/s),
danger_signal is forced False. If unused, _last_omega stays 0 and the gate
is transparent — backward compatible.
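The gating behaviour above can be sketched as follows. The class and method names are hypothetical, purely for illustration; the actual module wires this through dimos streams.

```python
# Hypothetical sketch of the rotation gate described above; class and
# method names are illustrative, not the module's real API.
class RotationGate:
    def __init__(self, omega_max=0.3):
        self.omega_max = omega_max
        self._last_omega = 0.0   # stays 0 if angular_velocity never connects

    def on_angular_velocity(self, omega):
        self._last_omega = omega

    def gate(self, raw_danger):
        # suppress danger while turning: rotation breaks the tau assumptions
        if abs(self._last_omega) > self.omega_max:
            return False
        return raw_danger

gate = RotationGate()
print(gate.gate(True))           # True: gate is transparent when unused
gate.on_angular_velocity(0.5)    # |omega| > omega_max
print(gate.gate(True))           # False: danger suppressed while turning
```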

Evaluation Results

Latency: ~2.5 ms/frame

τ Correlation with LiDAR TTC (82 odom-gated frames: v_fwd > 0.15 m/s, |ω| < 0.15 rad/s)

| Pearson r | Spearman r | RMSE |
| --- | --- | --- |
| +0.144 | +0.173 | 3.31 |

Correct sign — closer obstacle → smaller τ — but magnitude is noisy; use τ as a
ranked urgency signal, not a precise timer.

Binary Detection (GT: lidar_dist < 1.5 m, GT positive rate: 84%)

| τ threshold | Precision | Recall | F1 |
| --- | --- | --- | --- |
| 3.0 (default) | 0.939 | 0.295 | 0.449 |
| 5.8 (max F1) | 0.918 | 0.533 | 0.675 |

At τ=3.0, 93.9% of alarms are confirmed by LiDAR. Tune tau_threshold per use-case.
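The metric itself can be reproduced on toy numbers (the values below are made up for illustration, not taken from the evaluation): predict danger when τ falls below the threshold, take ground truth as lidar_dist < 1.5 m.

```python
import numpy as np

# Toy sketch of the binary-detection metric; values are illustrative only.
tau        = np.array([2.1, 8.0, 2.9, 4.0, 1.5, 9.0])   # per-frame min tau
lidar_dist = np.array([0.9, 3.0, 1.2, 1.1, 1.6, 2.5])   # metres

pred = tau < 3.0          # alarm fired
gt   = lidar_dist < 1.5   # obstacle actually close, per LiDAR

tp = np.sum(pred & gt)
precision = tp / pred.sum()   # fraction of alarms confirmed by LiDAR
recall    = tp / gt.sum()     # fraction of danger frames caught
f1 = 2 * precision * recall / (precision + recall)
```

Sweeping the threshold and re-running this computation is how a max-F1 operating point like τ=5.8 is found.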

Live visualization. scripts/visualize_optical_flow.py replays
unitree_office_walk through the same backend and streams annotated frames,
the tau grid, and a min_tau time-series into a Rerun viewer — scrub any
frame to see which cell fired and why.

Known Limitations

| Limitation | Consequence | Mitigation |
| --- | --- | --- |
| Stationary obstacles produce no flow | Blind to static walls at constant distance | LiDAR handles static proximity |
| τ value noisy (RMSE ≈ 3 s) | Not a precise collision timer | Ranked alarm; wire tac_grid to planner |
| Low-texture surfaces yield few keypoints | Reduced grid coverage | Edge-enhancing preprocessing |

Files Changed

New — production code: dimos/perception/optical_flow/ (5 files)
New — evaluation: scripts/eval_optical_flow_gated.py, RESULTS.md
New — demo: scripts/visualize_optical_flow.py (Rerun viewer)
No existing files modified.

  Monocular Time-to-Contact detection via FAST keypoints + Lucas-Kanade flow.
  Follows dimos Config pattern; rotation-gated via optional angular_velocity
  stream. Closes dimensionalOS#1775.
  Eval: P=0.939 at τ=3.0 on unitree_office_walk. Visualization streams annotated frames +
  τ timeline to Rerun.
@leshy
Contributor

leshy commented Apr 24, 2026

can you elaborate on why a mobile robot cannot avoid stationary obstacles with optical flow?

how can I test this? how did you test this? for example,

"How much does optical flow correlate with LiDAR?"
Weakly. Spearman r = +0.173 — correct direction but too noisy for a calibrated

can you show me how to reproduce this number?

@jhengyilin
Author

Testing

I evaluated the module offline against LiDAR ground truth on the unitree_office_walk dataset (51s indoor sequence). The eval script is at scripts/eval_optical_flow_gated.py — it runs every frame through the optical flow backend, filters for frames where the robot is moving forward (speed > 0.15 m/s) and not turning (|ω| < 0.15 rad/s) to isolate pure translation, then uses the LiDAR distance on those frames as ground truth to test optical flow against. It computes:

  • Correlation — Spearman r = +0.173 between optical flow τ and LiDAR TTC
  • Binary detection — Precision = 0.939, Recall = 0.295 at τ=3.0 (we take Ground Truth as LiDAR distance < 1.5m)
  • Latency — ~2.5 ms/frame
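The odom gating step can be sketched like this (field names and values are illustrative, not the dataset's actual schema):

```python
import numpy as np

# Illustrative odom gate: keep only near-pure-translation frames before
# comparing optical-flow tau against LiDAR TTC. Numbers are made up.
speed = np.array([0.05, 0.30, 0.20, 0.25])   # forward speed, m/s
omega = np.array([0.00, 0.02, 0.40, 0.10])   # yaw rate, rad/s
dist  = np.array([2.00, 1.20, 0.80, 1.50])   # nearest LiDAR distance, m

keep = (speed > 0.15) & (np.abs(omega) < 0.15)

# LiDAR "ground truth" TTC on the kept frames: distance / forward speed
lidar_ttc = dist[keep] / speed[keep]
print(lidar_ttc)   # [4. 6.]
```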

To reproduce, first place the unitree_office_walk dataset folder under data/, then run from repo root:

Evaluation:

PYTHONPATH=.venv/lib/python3.12/site-packages/rerun_sdk \
    uv run python3 scripts/eval_optical_flow_gated.py

Visualization (opens a Rerun viewer where you can scrub the timeline and see flow arrows, τ grid, and DANGER/CLEAR labels frame-by-frame):

PYTHONPATH=.venv/lib/python3.12/site-packages/rerun_sdk \
    uv run python3 scripts/visualize_optical_flow.py

Results land in results/optical_flow_gated.json. Full methodology and numbers are in RESULTS.md.


Stationary obstacles and optical flow

A mobile robot can avoid stationary obstacles with optical flow — as long as two conditions are met:

  1. Relative motion exists — the robot is moving toward the obstacle. The robot's own forward motion produces an expanding flow field in the image (pixels diverge outward from the focus of expansion), which gives a finite τ = 1/divergence. The obstacle doesn't need to move on its own.

  2. The obstacle has trackable texture — FAST needs to detect keypoints on the obstacle's surface (edges, corners, texture). A featureless surface may yield too few keypoints for a reliable divergence estimate in that grid cell.

The only case where optical flow truly can't help is when there is zero relative motion between the camera and the obstacle — e.g., the robot is stationary, or moving parallel to a wall at constant distance. No pixel displacement means zero divergence and no alarm. That's the specific case where LiDAR is still needed for proximity sensing.
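A small worked example under an assumed pinhole model makes point 1 concrete. For pure forward motion toward a static frontal wall, the flow in normalized image coordinates is (x, y)·v/Z, so the divergence is 2v/Z and 2/divergence recovers the true time-to-contact Z/v for this geometry (the factor of 2 is a property of the frontal-plane model, not a claim about the module's code):

```python
import numpy as np

# Pinhole-model sketch: camera moving forward at speed v toward a static
# frontal wall at depth Z. Flow in normalized coords is (x, y) * v / Z,
# so divergence = 2 v / Z and true TTC Z / v = 2 / divergence.
Z, speed = 3.0, 0.5                      # wall 3 m away, closing at 0.5 m/s
y, x = np.mgrid[-1:1:100j, -1:1:100j]    # normalized image coordinates
u_flow = x * speed / Z
v_flow = y * speed / Z

du_dx = np.gradient(u_flow, x[0], axis=1)
dv_dy = np.gradient(v_flow, y[:, 0], axis=0)
div = du_dx + dv_dy                      # ~ 2 * speed / Z everywhere

ttc = 2.0 / div.mean()
print(round(ttc, 6))                     # 6.0 == Z / speed
```

The wall never moves; the finite TTC comes entirely from the robot's own motion, which is why a moving robot can detect static obstacles but a stationary one cannot.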

@leshy
Contributor

leshy commented Apr 24, 2026

  • you need to understand all code written and all decisions made.
  • you can do visual inspection; you don't need fancy LiDAR correlations, it should be obvious whether optical flow is detecting obstacles that are visible on screen or not
  • if doing fancy LiDAR correlation, make sure it's actually correct: visualize the expected obstacles on the camera image (according to LiDAR). Small timestamp differences between odom and video, and the way you take camera lens distortion into account, are critical (this isn't necessary at all for a first pass; but if doing it, make sure it's correct, since it's better to have no data than wrong data)
  • do we need this grid at all? I think there should be a nicer algo that gives you a gradient of the rate of divergence of keypoints across an image? it's ok if not easy etc, just asking as I assume there is such a thing
  • the keypoint detection algo used is heavy and sparse; you want light (easy to compute) and non-sparse keypoints. idk if this is a different setting for your existing keypoint algo, or an entirely different algo

…danger on connected-components

  Addresses review feedback on dimensionalOS#1775:
  - FAST corners → uniform meshgrid (light, non-sparse keypoints)
  - per-point FOE τ → divergence-on-grid τ
  - min(τ) alarm → connected-components on thresholded divergence
  - LK reconstruction-error filter on per-point quality
  - LiDAR-correlation evaluation removed
@jhengyilin
Author

jhengyilin commented Apr 26, 2026

Optical flow obstacle avoidance

τ (time to contact) math

We use divergence (∂u/∂x + ∂v/∂y, computed via np.gradient on the flow field) and τ = 2 / divergence as the backend math for time to contact. This is more robust than the FOE (focus of expansion) approach, which breaks down for motion near the FOE, where flow vectors are small.

Keypoint detector

We originally considered the FAST keypoint detector, but it is designed for SLAM use (which requires keypoints plus descriptors) and is relatively expensive to compute. Furthermore, we need flow at every image location, including smooth surfaces (for example, a wall or a large single-texture obstacle), where FAST can hardly find corners. A uniform grid is content-independent and constant-compute, a better fit for "light + non-sparse keypoints".
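Generating the uniform grid is a one-liner (the spacing below is illustrative, not the module's configured value):

```python
import numpy as np

# Content-independent uniform grid of LK track points (illustrative spacing):
# constant compute per frame and full-frame coverage, smooth walls included.
H, W, step = 480, 640, 16
ys, xs = np.mgrid[step // 2:H:step, step // 2:W:step]
pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float32)
pts = pts.reshape(-1, 1, 2)   # the (N, 1, 2) layout cv2.calcOpticalFlowPyrLK expects
print(pts.shape)              # (1200, 1, 2)
```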

Alarm

We originally alarmed on min(τ), where a single noisy cell could trip DANGER. This iteration switches to connected components on the thresholded divergence map; the alarm only fires on coherent blobs (~80×80 px minimum).
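The blob gate can be illustrated with a minimal BFS labelling. This is a numpy/stdlib stand-in to show the logic; a library routine (e.g. OpenCV's connected components) would do the same job in production.

```python
import numpy as np
from collections import deque

# Minimal stand-in for the connected-components alarm: label 4-connected
# blobs in the thresholded divergence mask, alarm only on a coherent blob.
def largest_blob(mask):
    seen = np.zeros_like(mask, dtype=bool)
    best = 0
    H, W = mask.shape
    for sy in range(H):
        for sx in range(W):
            if mask[sy, sx] and not seen[sy, sx]:
                size, q = 0, deque([(sy, sx)])
                seen[sy, sx] = True
                while q:
                    cy, cx = q.popleft()
                    size += 1
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < H and 0 <= nx < W and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                best = max(best, size)
    return best

mask = np.zeros((20, 20), dtype=bool)
mask[2, 2] = True                  # isolated noisy cell: ignored
mask[8:13, 8:13] = True            # coherent 25-cell blob: alarms
danger = largest_blob(mask) >= 15  # ~15-cell minimum from the text
print(danger)                      # True
```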

Pipeline

flowchart TD
    F["Frame N-1 + Frame N"] --> LK["Lucas-Kanade on uniform grid points"]
    LK --> ERR["Drop tracks with high LK matching error"]
    ERR --> UV["raw u, v per surviving point"]

    UV --> GS["Gaussian smooth on flow field (u, v)"]
    GS -- "np.gradient to get du/dx and dv/dy" --> SOB["Per-cell divergence = (du/dx + dv/dy)"]
    SOB -- "tau = 2 / divergence " --> MS["Median smooth on divergence / tau map"]

    UV -. "raw flow" .-> A1["Arrow GEOMETRY: direction and length"]
    MS -. "smoothed tau" .-> A2["Arrow COLOR: red, yellow, green by tau band"]
    A1 --> VIZ["flow_visualization: Out Image"]
    A2 --> VIZ

    MS --> THR["Threshold mask: divergence above limit"]
    THR --> CC["Connected components on mask"]
    CC --> CHK{"Largest blob area at least 15 cells?"}
    CHK -- yes --> DT["danger_signal: Out Bool = TRUE"]
    CHK -- no --> DF["danger_signal: Out Bool = FALSE"]

   

In the GIFs below:

  • Arrow direction and length = the raw (u, v) per tracked point — what LK actually returned for that frame, noise included.
  • Arrow color (red / yellow / green) = the τ band from the smoothed divergence map — reflects the algorithm's per-cell danger reading after Gaussian + median smoothing. Red = τ < 3 frames, yellow = τ < 6, green = safe / not approaching.
  • DANGER / CLEAR label in the corner = output of the connected-components alarm on the thresholded smoothed divergence map. The alarm fires only if a coherent low-τ blob exceeds ~15 cells (~80×80 px), which filters out single-cell and small noisy blobs.
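The color banding reduces to a tiny lookup (thresholds from the bullets above, in frames; the function name is hypothetical):

```python
def tau_band(tau):
    # map smoothed per-cell tau (in frames) to the arrow color bands above
    if tau < 3.0:
        return "red"      # imminent
    if tau < 6.0:
        return "yellow"   # approaching
    return "green"        # safe / not approaching

print([tau_band(t) for t in (1.0, 4.5, 20.0)])   # ['red', 'yellow', 'green']
```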

When this module is deployed, downstream DimOS consumers subscribe to:

  • danger_signal: Out[Bool] — the safety signal.
  • flow_data: Out[Any] — per-point (x, y, τ, u, v) array for any custom downstream consumer/application.
  • flow_visualization: Out[Image] — the annotated frame.

Demo

uv run python3 scripts/run_on_video.py [video_source]

Outdoor walk that ends at a dumpster - starts CLEAR as the dumpster is still distant, fires DANGER once it comes into range, briefly drops back to CLEAR as the camera reorients across the parking lot, then fires DANGER again at near-contact when the dumpster fills most of the frame:

IMG_4362_dumpster

Indoor scene — same algorithm, walking around a workbench with multiple obstacles (robotic arm, ArUco fixtures, cup):

IMG_4358_workbench

Module integration

OpticalFlowModule follows the standard dimos perception-module pattern (In[Image] for the camera; Out[Bool]/Out[Any]/Out[Image] for the danger signal, flow data, and annotated visualization image). Production deployment is via the existing dimos stack (blueprint + dimos run).

Running the full framework integration on macOS needs extra configuration and is somewhat out of scope for issue #1775, which is about exploring the feasibility of mono-camera optical-flow obstacle avoidance. To run the pipeline without extra setup, scripts/run_on_video.py calls the backend directly on a video stream: same backend code, no framework needed.

Notes

  • Flow field is naturally noisy (LK tracking errors, single-frame jitter). Three layers of noise handling: Gaussian smooth on the flow field before differentiating, median smooth on the divergence map after, and per-point LK reconstruction-error threshold drops bad tracks before they enter divergence at all.
  • Smooth uniform surfaces (the red Solo cup, a plain wood desk) sometimes don't fire even at near-contact. LK needs features to track and smooth surfaces have few.
  • iPhone footage is 30 fps; downsampled by stride=2 to ~15 fps to match the Go2's frame rate.
  • Rotation breaks both LK's brightness-constancy / small-displacement assumptions and the forward-translation assumption behind τ = 2 / divergence. The module exposes angular_velocity: In[Any] and suppresses the danger signal when |ω| > 0.3 rad/s (omega_max).
  • Optical flow is an assumption-heavy approach (brightness constancy, sufficient texture to track, near-translational motion); when the assumptions break, the output degrades. So it is best used as a cheap mono-camera second-opinion / fallback signal, not primary obstacle-avoidance guidance (the Go2 already has LiDAR for that). Where it is most useful is platforms without depth sensors. Some future upgrade thoughts: dense flow (Farneback), learned flow (RAFT), or fusion with IMU / depth.
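The effect of the smoothing layers on a single-cell spike can be seen with a minimal numpy stand-in (a crude 3×3 box filter in place of the actual Gaussian; illustrative only):

```python
import numpy as np

# Minimal stand-in for the smoothing layers: a 3x3 box filter (in place of
# the Gaussian) spreads and attenuates a single-cell divergence spike so it
# can no longer trip a per-cell threshold on its own.
def box_smooth(a, k=3):
    pad = k // 2
    p = np.pad(a, pad, mode="edge")
    out = np.zeros_like(a, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (k * k)

div = np.zeros((10, 10))
div[4, 4] = 5.0                  # isolated spike from a bad LK track
smoothed = box_smooth(div)
print(div.max(), round(smoothed.max(), 3))   # 5.0 0.556
```

A coherent blob of high divergence, by contrast, survives the smoothing largely intact, which is what lets the connected-components gate separate real obstacles from tracker noise.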

